Meta-Learning by the Baldwin Effect
The scope of the Baldwin effect was recently called into question by two
papers that closely examined the seminal work of Hinton and Nowlan. To date,
there has been no demonstration of its necessity in empirically
challenging tasks. Here we show that the Baldwin effect is capable of evolving
few-shot supervised and reinforcement learning mechanisms, by shaping the
hyperparameters and the initial parameters of deep learning algorithms.
Furthermore, it can genetically accommodate strong learning biases on the same
set of problems as a recent machine learning algorithm, MAML (Model-Agnostic
Meta-Learning), which uses second-order gradients instead of evolution to learn
a set of reference parameters (initial weights) that allow rapid adaptation to
tasks sampled from a distribution. Whilst in simple cases MAML is
more data efficient than the Baldwin effect, the Baldwin effect is more general
in that it does not require gradients to be backpropagated to the reference
parameters or hyperparameters, and permits effectively any number of gradient
updates in the inner loop. The Baldwin effect learns strong learning-dependent
biases, rather than purely genetically accommodating fixed behaviours in a
learning-independent manner.
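As a rough illustration of the contrast the abstract draws, the sketch below shows MAML's two-level structure on a toy problem: an inner gradient step adapts the reference parameters to one task, and the outer update differentiates through that step (the "second-order gradients" mentioned above). This is a minimal JAX sketch under assumed details (a linear model, a sine-regression toy task, a single inner step); none of these specifics are taken from the paper.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Deliberately tiny model: params = (w, b). Purely illustrative.
    w, b = params
    return w * x + b

def task_loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

def inner_update(params, x, y, inner_lr=0.01):
    # One step of task-specific adaptation (MAML's inner loop).
    grads = jax.grad(task_loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)

def maml_loss(params, x_train, y_train, x_val, y_val):
    # Post-adaptation loss on held-out task data; differentiating this
    # w.r.t. `params` backpropagates through the inner update, which is
    # where the second-order terms arise.
    adapted = inner_update(params, x_train, y_train)
    return task_loss(adapted, x_val, y_val)

params = (jnp.array(0.0), jnp.array(0.0))
x = jax.random.uniform(jax.random.PRNGKey(0), (10,), minval=-1.0, maxval=1.0)
y = jnp.sin(3.0 * x)  # one task sampled from a hypothetical distribution
outer_grads = jax.grad(maml_loss)(params, x[:5], y[:5], x[5:], y[5:])
```

A Baldwinian variant would replace the final `jax.grad` call with an evolutionary estimate of the outer update (e.g. a population-based search over `params` scored by post-adaptation loss), which is why it needs no gradients backpropagated to the reference parameters and tolerates any number of inner-loop updates.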
An Empirical Study of Implicit Regularization in Deep Offline RL
Deep neural networks are the most commonly used function approximators in
offline reinforcement learning. Prior works have shown that neural nets trained
with TD-learning and gradient descent can exhibit implicit regularization that
can be characterized by under-parameterization of these networks. Specifically,
the rank of the penultimate feature layer, also called the "effective rank",
has been observed to collapse drastically during training. In turn, this
collapse has been argued to reduce the model's ability to further adapt in
later stages of learning, leading to diminished final performance. Such an
association between the effective rank and performance makes effective rank
compelling for offline RL, primarily for offline policy evaluation. In this
work, we conduct a careful empirical study on the relation between effective
rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind
Lab. We observe that a direct association exists only in restricted settings
and disappears in more extensive hyperparameter sweeps. We also empirically
identify three phases of learning that explain the impact of implicit
regularization on the learning dynamics, and find that bootstrapping alone is
insufficient to explain the collapse of the effective rank. Further,
we show that several other factors could confound the relationship between
effective rank and performance and conclude that studying this association
under simplistic assumptions could be highly misleading.
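For concreteness, a common way prior work on implicit under-parameterization operationalises the effective rank is as the smallest number of singular values of the feature matrix that captures all but a small fraction delta of the total spectral mass. The sketch below implements that assumed variant; delta = 0.01 is a conventional choice from that literature, not necessarily the exact setting used in this paper.

```python
import jax.numpy as jnp

def effective_rank(features, delta=0.01):
    # `features`: (batch, dim) matrix of penultimate-layer activations.
    s = jnp.linalg.svd(features, compute_uv=False)
    cumulative = jnp.cumsum(s) / jnp.sum(s)
    # Smallest k such that the top-k singular values hold a (1 - delta)
    # fraction of the total spectral mass.
    return int(jnp.argmax(cumulative >= 1.0 - delta)) + 1
```

Under this measure, rank collapse shows up as the returned k shrinking over the course of training even though the nominal width of the layer stays fixed.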